Abstract

We introduce a new distance and use it for parameter estimation. We observe
how it operates and then use, in its place, the usual methods of estimation,
which we call the methods of the new approach. We find that this yields a
discretization of the continuous case. Moreover, when truncated data must be
considered, nothing changes in the computations.
1 Introduction

In the traditional approach to estimation there are three basic elements: a
family of theoretical probability distributions, an empirical law and some
estimation methods. We choose a method according to its properties and the
problem at hand. The empirical distribution and the family of theoretical
laws are data of the problem, whatever the method chosen. We propose a new
viewpoint in which the empirical law corresponding to a given theoretical one
is perceived as an empirical conditional distribution given the data. It then
becomes an estimate of the conditional theoretical law given the observations
before being an estimate of the theoretical distribution from which it
emanated.

We introduce a new distance and use it for estimation. We then observe how it
operates and use, in its place, the usual methods of estimation, which we call
the methods of the new approach. This leads to a unification of the methods of
estimation, since we no longer distinguish between fixed type-I censored data
and complete samples, or between the discrete and continuous cases. We thus
obtain a considerable simplification of the computational procedures in
estimation problems. The distinction made in the traditional approach between
truncated or type-I censored data and complete samples is not really
justified, since all samples are in fact truncated. Indeed, a sample is not
truncated only if it covers the whole support of the distribution from which
it was drawn; otherwise it is truncated. Moreover, it is natural to consider
that the sample describes only the parts of the distribution which capture the
data. The other parts are obtained by deduction. The discretization of the
continuous case obtained with the new approach is also justified. Indeed,
practically all usual distributions can be reconstituted exactly from two or
three points of their graphs. We can then estimate them from two or three
points which represent their graphs empirically. In addition to unifying
several methods of estimation, estimation with the new measure has the
following specific properties. It does not require the family of candidate
theoretical distributions to be made up of the same type of laws. There is
always a solution, which will in general be acceptable. If the ratios of the
frequencies of an empirical distribution coincide with those of the
theoretical one from which it emanated, then from the first we can find the
second with certainty. If the ratios of the frequencies of the empirical
distribution coincide with those of the theoretical one which it best fits,
then the estimates obtained are optimal in the sense that one cannot improve
them. We also checked on some examples, analytically and numerically, that
when the ratios of the frequencies of the empirical distribution tend towards
those of the theoretical one, the estimates tend towards the true parameters.
This last property implies convergence of the estimators. We prove the
convergence of the estimators obtained with the new measure for a broad class
of usual laws. Moreover, the new measure gives more flexibility in computation
than the method of maximum likelihood.

This paper has three distinct parts. The first concerns a new distance,
presented in section 2. It can be studied as a mathematical object in its own
right, without necessarily referring to its applications in statistics. It is
a metric which has no equivalent in the mathematical literature. We note some
of its remarkable properties, which open new prospects. The second part
relates to the use of this distance in estimation problems in statistics. It
gives rise to a new method of estimation, presented in section 3. The study
proposed in this part is by no means exhaustive, but the results obtained are
already interesting and encouraging. The third part relates to a new approach
to estimation. This new approach can be considered separately; it is the
discretization of the methods of the continuous case. By adopting it we widen
the field of application of the usual methods of estimation. It is presented
in section 4. In sections 5 and 6 we illustrate, using examples, the
possibilities of the new method and of the new approach to estimation. In
section 7 we show what users of statistics gain immediately from our work in
comparison with the traditional approach. Lastly, in section 8 we briefly
summarize the results obtained.
2 A New Distance Between Probability Distributions

In statistics, we use distances to measure the difference between probability
distributions. Usually these distances are conceived in the same manner: the
differences between distributions are almost always expressed through
variations, in the geometric sense, between their graphs. We introduce a
distance which operates differently. It is based on relative properties of
probability measures, that is, on ratios of their values. Its interest is due
especially to the fact that it is not equivalent to the usual distances.

Definition 1 Consider two probability measures $P$ and $Q$ defined on the same
measurable space $(\Omega, \mathcal{F})$, $f$ and $g$ being their respective
probability distributions, not necessarily with respect to the same measure,
and $E$ an event from this space. We say that $f$ and $g$ have the same
variations on $E$ if their restrictions to $E$ define the same probability
measure on $E$ equipped with the trace sigma-algebra of $\mathcal{F}$ on $E$.

Example 2 Let $f$ be a density of a probability measure $P$ and $E$ an event
such that $P(E) > 0$. The restriction of $f$ to $E$ and the conditional
distribution of $f$ with respect to $E$ define the same probability measure on
$E$, and consequently they have the same variations on $E$.

Example 3 Let $f$ be a probability distribution and $c$ a positive constant.
The functions $f$ and $g = f + c$ have the same variations in the geometric
sense, but they do not have the same variations within the meaning of the
above definition.

Proposition 4 Let $f$ and $g$ be two probability distributions defined and
positive on a set $E$ not reduced to a single element. If at every point
$(x, y)$ of $E \times E$ we have

$$\frac{f(x)}{f(y)} = \frac{g(x)}{g(y)}, \qquad (1)$$

then $f$ and $g$ have the same variations on $E$.

Proof. If $E$ is discrete, the distribution generated by the restriction of
$f$ to $E$ is $f_E = f / \sum_{x \in E} f(x)$ on $E$ and $f_E = 0$ otherwise.
If $x_0$ is in $E$ with $g(x_0) \neq 0$, then (1) implies that for all $x$ in
$E$, $f(x) = g(x) f(x_0) / g(x_0)$. Substituting this expression of $f$ into
$f_E$, we find the conditional distribution generated by $g$ on $E$, which
gives the result. In the same way, we obtain the result for probability
densities on $\mathbb{R}$ with respect to the Lebesgue measure when $E$ is a
subset of $\mathbb{R}$ with positive probability. ■

Definition 5 Let $f$ and $g$ be two probability distributions and $E$ an event
on which they are strictly positive. If $E$ is discrete and not reduced to a
single element, we call distance in variations between $f$ and $g$ on $E$ the
quantity

$$d_v(f, g)_E = \sum_{(x, y) \in E \times E} \left| \frac{f(x)}{f(y)} - \frac{g(x)}{g(y)} \right|.$$

If $E$ is an interval of $\mathbb{R}$ and $f$ and $g$ are probability
densities on $\mathbb{R}$ with respect to the Lebesgue measure $\mu$ on
$\mathbb{R}$, we call distance in variations between $f$ and $g$ on $E$ the
quantity

$$d_v(f, g)_E = \iint_{E \times E} \left| \frac{f(x)}{f(y)} - \frac{g(x)}{g(y)} \right| \mu(dx)\,\mu(dy).$$



Note that $d_v$ possesses the properties of symmetry and triangle inequality.
But in the identity property $d_v(f, g)_E = 0 \iff f = g$ on $E$, the equality
between $f$ and $g$ must be understood in the sense that $f$ and $g$ have the
same variations on $E$.

Let $d$ be the distance which measures the difference at two points $x$ and
$y$ between two functions $f$ and $g$ by the quantity
$d(f, g)(x, y) = |f(x) - g(x)| + |f(y) - g(y)|$.

Proposition 6 We have the following property for the distance $d_v$:
$d(f, g)(x, y) = 0 \implies d_v(f, g)(x, y) = 0$; the converse is not always
true.

Proof. Follows directly from the definitions of $d$ and $d_v$. ■
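To make the discrete form of the distance in variations concrete, here is a minimal Python sketch. The function names `d_v` and `d_pointwise` and the toy distribution are illustrative assumptions, not part of the original text. It checks the situation of Example 2 (a distribution and its conditional version on $E$ have the same variations) and the non-reversibility stated in Proposition 6.

```python
from itertools import product


def d_v(f, g, points):
    """Distance in variations on a finite set: sum over ordered pairs (x, y)
    of |f(x)/f(y) - g(x)/g(y)|. f and g map points to strictly positive weights."""
    return sum(abs(f[x] / f[y] - g[x] / g[y]) for x, y in product(points, repeat=2))


def d_pointwise(f, g, x, y):
    """The usual two-point distance d(f, g)(x, y) = |f(x)-g(x)| + |f(y)-g(y)|."""
    return abs(f[x] - g[x]) + abs(f[y] - g[y])


# Hypothetical discrete distribution on {0, 1, 2, 3}.
f = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

# Conditional distribution of f given the event E = {1, 2, 3}.
E = [1, 2, 3]
mass = sum(f[x] for x in E)
f_cond = {x: f[x] / mass for x in E}

# Same variations on E (d_v is zero up to rounding) although the two
# functions differ pointwise, so d_v = 0 does not imply d = 0.
print(d_v(f, f_cond, E))
print(d_pointwise(f, f_cond, 1, 2))
```

The sum runs over ordered pairs, so each unordered pair contributes both the ratio and its inverse, as in the definition above.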



3 New Method of Estimation 
3.1 Frequency Tables 

Let $\mathcal{F}$ be a family of probability distributions. If it contains
only one type of distribution we say that it is homogeneous; otherwise we say
that it is heterogeneous. A heterogeneous family can be made up of several
types of discrete and absolutely continuous distributions. Let us consider $f$
in $\mathcal{F}$ and some values $y_1, \ldots, y_k$ from its support. We call
theoretical table of frequencies of $f$ based on $y_1, \ldots, y_k$, or with
support $y_1, \ldots, y_k$, the $k$ couples
$(y_1, f_1), (y_2, f_2), \ldots, (y_k, f_k)$ where
$f_i = f(y_i) / \sum_{j=1}^{k} f(y_j)$, $i = 1, 2, \ldots, k$. We denote by
$\bar f$ the distribution defined by this table. We say that this table
completely characterizes the family $\mathcal{F}$ if and only if there is a
bijection between $\mathcal{F}$ and
$\bar{\mathcal{F}} = \{\bar f, f \in \mathcal{F}\}$. In this case,
theoretically, from $\bar f$ we can determine $f$; $\bar f$ is the
representative element of $f$ in $\bar{\mathcal{F}}$. We call
$\bar{\mathcal{F}}$ the family of auxiliary distributions based on
$y_1, \ldots, y_k$ associated to $\mathcal{F}$. We also say that the $y_i$,
$i = 1, 2, \ldots, k$, form a basis of observations which characterizes the
family $\mathcal{F}$.

Proposition 7 Let us consider two probability laws $f$ and $g$ belonging to a
family of distributions $\mathcal{F}$ and having the same support $E$. If $F$
is a basis of observations which characterizes the family $\mathcal{F}$, then
$d_v(f, g)_F = 0$ implies that $d_v(f, g)_E = 0$.

Proof. If $d_v(f, g)_F = 0$ then $\bar f = \bar g$, where $\bar f$ and
$\bar g$ are the auxiliary distributions of $f$ and $g$ respectively based on
$F$. If in addition $F$ constitutes a basis of observations characterizing
$\mathcal{F}$, then we deduce that $f = g$. ■

It should be noted that none of the usual distances has this property. It is
the key idea that justifies using the point estimation methods of the discrete
case in the continuous one.
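The theoretical table of frequencies can be computed directly from a density. The following sketch is only an illustration: the helper names, the three support points and the rescaling constant `c` are assumptions. It shows that the table depends on the density only through ratios, so a density and any rescaled (for instance truncated and renormalized) version of it share the same auxiliary distribution, while a different parameter value gives a different table.

```python
import math


def normal_pdf(x, m, s):
    """Density of the normal law with mean m and standard deviation s."""
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))


def theoretical_table(pdf, ys):
    """Frequencies f_i = f(y_i) / sum_j f(y_j) of the auxiliary distribution."""
    values = [pdf(y) for y in ys]
    total = sum(values)
    return [v / total for v in values]


ys = [-1.0, 0.0, 1.5]  # hypothetical basis of observations

full = theoretical_table(lambda x: normal_pdf(x, 0.3, 1.2), ys)

# A truncated density is the same function divided by a constant on the
# retained region; the value of c is arbitrary here and cancels out.
c = 0.77
truncated = theoretical_table(lambda x: normal_pdf(x, 0.3, 1.2) / c, ys)

# A different parameter value produces a different table.
other = theoretical_table(lambda x: normal_pdf(x, 0.3, 1.0), ys)

print(full)
print(truncated)
print(other)
```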

3.2 Estimation 

Let us consider $k$ couples $(y_1, \hat f_1), \ldots, (y_k, \hat f_k)$ of a
table of empirical frequencies obtained after grouping the observations of a
probability law belonging to a family of distributions $\mathcal{F}$, with
$\hat f_1 + \hat f_2 + \cdots + \hat f_k = 1$. We say that this table
empirically characterizes the family $\mathcal{F}$ if the theoretical
frequency table based on the $y_i$, $i = 1, 2, \ldots, k$, characterizes it
too. In the sequel our starting point will always be, in the continuous as in
the discrete case, a table of empirical frequencies based on $k$ values
$y_1, \ldots, y_k$ constituting a basis of observations which completely
characterizes the studied family. We suppose that it is given with the
problem, and thus we do not discuss how to obtain it, in particular in the
continuous case. One can use, for example, procedures that select the optimal
number of bins for a regular histogram (see for example Birgé and Rozenholc).
When we use the maximum likelihood procedure, nothing theoretically prohibits
estimating $n$ parameters from a table of empirical frequencies based on $k$
values where $k$ is lower than or equal to $n$. But in practice we sometimes
encounter unexpected difficulties. In certain cases the results obtained are
completely aberrant. The literature reports some paradoxes attached to the use
of the maximum likelihood procedure in these cases ([3]). When we use tables
of empirical frequencies whose basis characterizes the family of theoretical
probability distributions containing the distribution we seek, we avoid these
difficulties in advance. We denote by $\hat f$ the discrete empirical
distribution represented by this table. It is completely determined if the
ratios $\hat f_i / \hat f_j = \hat f(y_i) / \hat f(y_j)$,
$i, j = 1, 2, \ldots, k$, are known, and if $\hat f$ arises from a sample of a
given theoretical distribution $f$, then by the law of large numbers
$\hat f(y_i) / \hat f(y_j)$ tends to $f(y_i) / f(y_j)$ when the sample size
tends to infinity. This result remains valid even when the support $S$
represents a fixed type-I censored sample. When grouping in classes, if one
withdraws several classes and their frequencies, the frequencies of the
remaining classes keep this property. Whether the sample considered is
truncated or not, and whether the distribution from which it comes is discrete
or absolutely continuous, we can measure the difference in variations between
$\hat f$ and a theoretical distribution $f$ at $y_1, \ldots, y_k$ by

$$d_v(\hat f, f)(y_1, \ldots, y_k) = \sum_{i, j \in \{1, \ldots, k\}} \left| \frac{\hat f_i}{\hat f_j} - \frac{f(y_i)}{f(y_j)} \right|.$$

Since $\hat f$ converges in probability towards the auxiliary distribution
$\bar f$ of $f$, $d_v(\hat f, f)$ converges in probability towards 0.

Let us consider two probability distributions $f$ and $g$ which do not
necessarily belong to the same type of laws and which are not equal to zero at
$y_1, \ldots, y_k$. If
$d_v(\hat f, f)(y_1, \ldots, y_k) < d_v(\hat f, g)(y_1, \ldots, y_k)$, we say
that $\hat f$ is closer to $f$ than to $g$ in the sense of $d_v$. We thus
define a new method of estimation.
Example 8 We simulated 10000 samples of size 100 from a binomial distribution
$B(8, 0.1)$ and 10000 others from a $B(15, 0.15)$. For each sample obtained we
kept only the observations belonging to $\{0, 1, 2, 3\}$ with their
frequencies. Then, starting from the empirical distribution thus defined, we
tried to identify the simulated law among the two binomial distributions
considered. The correct distribution is selected in 98.8% of cases when we
used samples from the former and in 99.43% of cases when we used samples from
the latter.

Example 9 We simulated 10000 samples of size 1000 from
$\mathcal{N}(1.2, 1.5)$ and we omitted the observations below the threshold
1.25. Each truncated sample was summarized into 11 classes. We selected
between $\mathcal{N}(1.2, 1.5)$ and the Gamma distribution $G(2, 0.5)$ using
the metric $d_v$. The distance $d_v$ selected the correct distribution, that
is $\mathcal{N}(1.2, 1.5)$, in 98.16% of cases.
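The selection procedure of Example 8 can be sketched as follows. This is a hedged illustration and not the authors' code: the function names, the seed, the reduced number of replications and, in particular, the rule of dropping values with zero observed frequency (needed to keep the ratios finite) are assumptions. No attempt is made here to reproduce the percentages reported above.

```python
import math
import random

POINTS = [0, 1, 2, 3]  # observations kept in Example 8


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def d_v(freq, pmf, points):
    """Sum over ordered pairs of |freq_i/freq_j - pmf(i)/pmf(j)|."""
    return sum(abs(freq[i] / freq[j] - pmf(i) / pmf(j)) for i in points for j in points)


CANDIDATES = {
    "B(8, 0.1)": lambda k: binom_pmf(k, 8, 0.1),
    "B(15, 0.15)": lambda k: binom_pmf(k, 15, 0.15),
}


def select(freq, candidates=CANDIDATES, points=POINTS):
    """Return the name of the candidate closest to freq in the sense of d_v.
    Assumption: values with zero observed frequency are dropped."""
    pts = [y for y in points if freq.get(y, 0) > 0]
    return min(candidates, key=lambda name: d_v(freq, candidates[name], pts))


def simulate_counts(n, p, size, rng):
    """Draw `size` binomial(n, p) variates and count those falling in POINTS."""
    counts = {}
    for _ in range(size):
        k = sum(rng.random() < p for _ in range(n))
        if k in POINTS:
            counts[k] = counts.get(k, 0) + 1
    return counts


def hit_rate(n, p, true_name, reps=200, size=100, seed=0):
    """Fraction of simulated truncated samples for which true_name is selected."""
    rng = random.Random(seed)
    hits = sum(select(simulate_counts(n, p, size, rng)) == true_name for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    print(hit_rate(8, 0.1, "B(8, 0.1)"))
    print(hit_rate(15, 0.15, "B(15, 0.15)"))
```

Raw counts can be used in place of relative frequencies because only ratios enter $d_v$.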

Let us consider, in an estimation problem, a family $\mathcal{F}$ of
theoretical laws and an empirical distribution $\hat f$ with support
$y_1, \ldots, y_k$ which constitutes a basis of observations characterizing
$\mathcal{F}$. If there exists $f$ belonging to $\mathcal{F}$ such that
$d_v(\hat f, f)(y_1, \ldots, y_k) = 0$, we say that $f$ is an exact solution.

Proposition 10 The exact solution, when it exists, is optimal in the sense
that we cannot improve it.

Proof. Indeed, in this case there is in $\bar{\mathcal{F}}$ a distribution
whose table of frequencies coincides exactly with that of $\hat f$; it is
unique and it is $\bar f$. ■

Criterion 11 (of quality) Let $\hat f$ be an empirical distribution and $f$
the theoretical one which best fits it when we estimate by a given method. If
$d_v(\hat f, f) = 0$ then, according to the preceding proposition, the
estimate obtained is optimal in the sense that it cannot be improved.

This is a quality criterion which, when it holds, not only supplants all the
usual criteria but also gives a total and definitive guarantee of the
optimality of the estimates. We show below with examples that in some cases
one can very easily find estimates possessing this property. We also show with
examples that, when $d_v(\hat f, f)$ tends towards 0, the differences between
the estimates and the estimated values tend towards 0, and in the limit one
obtains their exact values. This latter property, which remains to be proved
in the general case, immediately implies convergence of the estimates. For the
moment there is already the following result.



3.3 Convergence in Probability of the Minimum Distance 
Estimator 

Let $X_1, \ldots, X_n$ be a sample with $X_i \sim f(x, \theta)$,
$\theta = (\theta_1, \ldots, \theta_p)^T \in \Theta \subset \mathbb{R}^p$,
with

$$f(x, \theta) = K(x) \exp\Big\{ \sum_{k=1}^{p} \theta_k T_k(x) + A(\theta) \Big\}, \qquad (2)$$

$x \in \mathcal{X} \subset \mathbb{R}$, where $\mathcal{X}$ is a Borel set of
$\mathbb{R}$ such that $\mathcal{X} = \{x : f(x, \theta) > 0\}$ for all
$\theta \in \Theta$.

The family (2) is a large family of distributions; it contains, for example,
the family of normal laws and the family of Poisson laws. We assume that the
support $\mathcal{X}$ does not depend on $\theta$. Denote by $\hat\theta_n$
the estimator obtained by minimizing the metric $d_v$ between the empirical
distribution $\hat f_n$ (based on a sample of size $n$) and the theoretical
distribution $f(\cdot, \theta)$, that is

$$\hat\theta_n = \arg\min_{\theta \in \Theta} d_v(f(\cdot, \theta), \hat f_n).$$

This estimator falls into the class of M-estimators. Using well-known theorems
on the convergence of M-estimators (see for example Amemiya [1]) we will prove
that $\hat\theta_n$ converges in probability to the true parameter.
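As an illustration of the minimum distance estimator, the sketch below estimates the parameter of a Poisson law from frequencies observed on a few points only. It is a minimal sketch under stated assumptions: the function names are hypothetical, the minimization is a plain grid search (chosen for transparency, not taken from the paper), and the "empirical" table is built from the exact Poisson probabilities restricted to $\{1, 2, 3, 4\}$, so that an exact solution exists.

```python
import math


def poisson_pmf(k, theta):
    return math.exp(-theta) * theta**k / math.factorial(k)


def d_v(freq, theta, points):
    """Distance in variations between the table freq and the Poisson(theta) law."""
    return sum(
        abs(freq[i] / freq[j] - poisson_pmf(i, theta) / poisson_pmf(j, theta))
        for i in points
        for j in points
    )


def min_distance_estimate(freq, points, grid):
    """Grid-search minimizer of theta -> d_v(freq, theta)."""
    return min(grid, key=lambda t: d_v(freq, t, points))


# Truncated table: only the values 1..4 are observed, 0 and 5+ are missing.
points = [1, 2, 3, 4]
true_theta = 2.5
raw = {k: poisson_pmf(k, true_theta) for k in points}
total = sum(raw.values())
freq = {k: v / total for k, v in raw.items()}

grid = [0.01 * i for i in range(1, 1001)]  # candidate values 0.01, ..., 10.00
theta_hat = min_distance_estimate(freq, points, grid)
print(theta_hat)
```

Because the ratios $f(i, \theta)/f(j, \theta)$ do not involve the normalizing term $A(\theta)$, the truncation of the table plays no role in the computation.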

Proposition 12 Let $X_1, \ldots, X_n$ be a sample from the family of
distributions (2). If the set of natural parameters $\Theta$ is convex and the
true parameter $\theta$ is an interior point of $\Theta$, then the estimator
$\hat\theta_n$ obtained by minimizing the distance in variations $d_v$
converges in probability to the true parameter $\theta$, i.e.,

$$\hat\theta_n \xrightarrow{\;P\;} \theta.$$
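Proof. Since we search for a minimum of the criterion function $d_v$, it
suffices to show, under the assumptions of the family (2) and the convexity of
the set $\Theta$, that $d_v(\theta, x)$ seen as a function of $\theta$ is a
convex function (see Amemiya [1]). This reduces the problem to the convexity
of

$$\delta_{ij}(\theta) = \left| C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k \big(T_k(y_i) - T_k(y_j)\big) \Big\} - A_{ij} \right|.$$

For $\lambda, \mu \in \mathbb{R}_+$ with $\lambda + \mu = 1$, and
$\theta^{(1)}, \theta^{(2)} \in \Theta$, we have

$$\delta_{ij}(\lambda\theta^{(1)} + \mu\theta^{(2)}) = \left| C_{ij} \exp\Big\{ \sum_{k=1}^{p} \big[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\big] \big(T_k(y_i) - T_k(y_j)\big) \Big\} - A_{ij} \right|, \qquad (3)$$

where $C_{ij} = K(y_i)/K(y_j)$, assumed to satisfy $C_{ij} > 0$, and
$A_{ij} = \hat f(y_i)/\hat f(y_j)$ is the corresponding ratio of empirical
frequencies. From the convexity of the exponential function we have

$$\exp\Big\{ \sum_{k=1}^{p} \big[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\big] \big(T_k(y_i) - T_k(y_j)\big) \Big\} \le \lambda \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(1)} \big(T_k(y_i) - T_k(y_j)\big) \Big\} + \mu \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(2)} \big(T_k(y_i) - T_k(y_j)\big) \Big\},$$

then

$$C_{ij} \exp\Big\{ \sum_{k=1}^{p} \big[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\big] \big(T_k(y_i) - T_k(y_j)\big) \Big\} \le \lambda C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(1)} \big(T_k(y_i) - T_k(y_j)\big) \Big\} + \mu C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(2)} \big(T_k(y_i) - T_k(y_j)\big) \Big\}$$

and

$$C_{ij} \exp\Big\{ \sum_{k=1}^{p} \big[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\big] \big(T_k(y_i) - T_k(y_j)\big) \Big\} - (\lambda + \mu) A_{ij} \le \lambda \Big[ C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(1)} \big(T_k(y_i) - T_k(y_j)\big) \Big\} - A_{ij} \Big] + \mu \Big[ C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(2)} \big(T_k(y_i) - T_k(y_j)\big) \Big\} - A_{ij} \Big].$$

Introducing the absolute value we get

$$\left| C_{ij} \exp\Big\{ \sum_{k=1}^{p} \big[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\big] \big(T_k(y_i) - T_k(y_j)\big) \Big\} - (\lambda + \mu) A_{ij} \right| \le \lambda \left| C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(1)} \big(T_k(y_i) - T_k(y_j)\big) \Big\} - A_{ij} \right| + \mu \left| C_{ij} \exp\Big\{ \sum_{k=1}^{p} \theta_k^{(2)} \big(T_k(y_i) - T_k(y_j)\big) \Big\} - A_{ij} \right|.$$

Hence $\delta_{ij}(\theta)$ is a convex function of $\theta$, which implies
the convexity of $d_v(\theta, x)$ seen as a function of $\theta$ and then the
convergence in probability of the minimum $d_v$ distance estimator. ■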

4 New Approach of Estimation 

4.1 Foundation 

Let us consider, in an estimation problem, the family of theoretical
distributions $\mathcal{F}$ and an element $f$ belonging to $\mathcal{F}$.
Writing $\hat f$ for the empirical distribution, we obviously have
$d_v(\hat f, f)(y_1, \ldots, y_k) = d_v(\hat f, \bar f)(y_1, \ldots, y_k)$,
where $\bar f$ is the representative of $f$ in $\bar{\mathcal{F}}$,
$\bar{\mathcal{F}}$ being the family of auxiliary distributions based on
$y_1, \ldots, y_k$ associated to $\mathcal{F}$.

$\bar f$ is a discrete probability distribution with the same support as
$\hat f$ and depends on the same parameters as $f$. If the theoretical table
of frequencies based on $y_1, \ldots, y_k$ completely characterizes the family
$\mathcal{F}$, then determining $\bar f$ is equivalent to determining $f$.
When $\mathcal{F}$ is homogeneous, to determine $\bar f$ we can also use,
instead of $d_v$, the usual methods (method of moments, method of maximum
likelihood, Bayesian methods, etc.). They are then called the methods of the
new approach. When proceeding in this way, everything occurs as if one
replaced the family $\mathcal{F}$ of theoretical distributions by the
corresponding family $\bar{\mathcal{F}}$. We also note the following:

1. In the discrete case, using the usual methods of estimation amounts to
estimating in the traditional way from truncated samples. This supposes that
any sample which does not completely cover the support of the distribution
from which it comes is considered to be truncated in a deterministic way, the
truncation being the parts which do not appear in the observations.

2. In the continuous case, in practice one often associates with the sample of
observations a discrete distribution that is optimal in a certain sense, and
uses it to estimate. Then, when replacing $d_v$ by the usual methods, we
obtain a discretization of the continuous case.

3. In the discrete case the auxiliary distribution $\bar f$ represents the
conditional distribution of $f$ given the observations $y_1, \ldots, y_k$. In
the continuous case $\bar f$ is calculated in a similar manner. It seems to
have the same interpretation there too, except that this type of calculation
does not exist in the theory of probability.

Purely for the sake of coherence with points 1, 2 and 3, we propose to view
the empirical distribution as the conditional empirical distribution given the
observations, since it is calculated knowing the observations, even if that is
not obvious in the continuous case. One then conceives it more easily as an
estimate of $\bar f$ before being an estimate of $f$.

5 Analytical computation 

In this part we discuss some very simple examples in order to bring out the
specificity of the new approach and its contribution compared to the
traditional one. Let us consider a table of frequencies based on two
observations $x$ and $y$ with their respective frequencies $n_1$ and $n_2$.
Starting from such a table, with the new method one can estimate only one
parameter. Such a table characterizes practically all the families of usual
laws when only one parameter has to be estimated. We can obtain such a table
when the sample considered is not truncated but of small size, or is truncated
and grouped in two classes only. In the light of the new distance we will see
in the examples which follow that, according to whether one estimates one
parameter or two simultaneously, even if the sample is not of small size, it
is preferable to group it in two or three classes only, because one can gain
in the precision of the estimates. Indeed, the two or three points obtained
carry more weight in representing the theoretical points of the distribution
which they describe empirically, and estimation with $d_v$ practically always
gives in this case an optimal solution in the most general sense.

5.1 Estimation of the parameter of the exponential distribution

Assume we want to estimate, from the preceding table, the probability density
$f_\lambda$ given by $f_\lambda(x) = \lambda e^{-\lambda x}$ if $x > 0$ and
$f_\lambda(x) = 0$ otherwise, $\lambda > 0$; $F$ denotes its cdf.

a. Suppose the table is a summary of a non-truncated sample. Then the
estimators of $\lambda$ by the method of maximum likelihood of the classical
approach, $\hat\lambda$, and of the new one, $\hat\lambda_N$, are
respectively $\hat\lambda = (n_1 + n_2) / (n_1 x + n_2 y)$ and
$\hat\lambda_N = (\log(n_1) - \log(n_2)) / (y - x)$. As we can see, in general
$\hat\lambda$ is different from $\hat\lambda_N$. When we compute
$\tilde\lambda$, the estimate obtained using $d_v$, we find that it is equal
to $\hat\lambda_N$. Here $\tilde\lambda$ is optimal in the general sense. If

$$\frac{n_1}{n_2} = \frac{f(x)}{f(y)} + \varepsilon$$

then, to first order in $\varepsilon$,

$$\hat\lambda_N(\varepsilon) = \lambda + \varepsilon k,$$

$k$ being a constant. Then $\hat\lambda_N(\varepsilon)$ tends towards
$\lambda$ when $\varepsilon$ tends towards 0. We can check that the difference
between $\lambda$ and the classical estimate $\hat\lambda(\varepsilon)$ does
not tend towards 0 when $\varepsilon$ tends towards 0. If the sample size
tends towards infinity then, by the law of large numbers, the differences
between the ratios of the empirical relative frequencies and the corresponding
theoretical ones tend towards 0, and consequently $\hat\lambda_N$ tends to
$\lambda$. But these ratios can also be close to each other for samples of
finite size. Note that the first solution is always acceptable here, but the
second is not. The second fails to be acceptable only if there are anomalies
in the sample of observations, and then one is warned; the first does not
allow us to detect such a deficiency of the sample. The second is not
acceptable when $x < y$ and $n_1 < n_2$, or conversely, but this is not what
one expects: since the exponential density is decreasing, for $x < y$ we must
have $n_1 > n_2$. Now if in a problem the preceding exact solution is not
acceptable and we have to propose an estimate of $\lambda$, that is always
possible with the new method. Put



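$$a(\lambda) = \left| \frac{f(x)}{f(y)} - \frac{n_1}{n_2} \right| + \left| \frac{f(y)}{f(x)} - \frac{n_2}{n_1} \right| \quad \text{and} \quad E = \{a(\lambda), \lambda > 0\}.$$

$E$ is a subset of $\mathbb{R}$ which is bounded below by 0. It therefore
admits a greatest lower bound, say $a_0$. If $a_0$ is in $E$ then there is
$\lambda_0 > 0$ such that $a(\lambda_0) = a_0$. In this case the estimate of
$\lambda$ is $\lambda_0$. If $a_0$ is not in $E$ then, for every strictly
positive integer $n$, there exists $\lambda > 0$ such that
$|a(\lambda) - a_0| < 1/n$. Put
$\Lambda_n = \{\lambda > 0 : |a(\lambda) - a_0| < 1/n\}$. $(\Lambda_n)$ is a
decreasing sequence, and then there exists $\Lambda_0$ such that
$\lim_{n \to \infty} \Lambda_n = \Lambda_0$. In this case, each value
$\lambda$ from $\Lambda_0$ can be considered as an estimate of $\lambda$ with
the new approach.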
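The two closed-form estimators of case a can be compared numerically. The sketch below is an illustration with hypothetical function names and values. It builds frequencies whose ratio coincides exactly with the theoretical ratio $f(x)/f(y) = e^{\lambda(y - x)}$ and shows that the estimate based on the ratio recovers $\lambda$, whereas the classical formula applied to the same two-point table does not. The guard in `lambda_new` encodes the acceptability condition discussed above (for $x < y$ one needs $n_1 > n_2$).

```python
import math


def lambda_classical(x, y, n1, n2):
    """Classical maximum likelihood estimate from the two-point table."""
    return (n1 + n2) / (n1 * x + n2 * y)


def lambda_new(x, y, n1, n2):
    """Estimate solving n1/n2 = f(x)/f(y) for the exponential density."""
    if (y - x) * (n1 - n2) <= 0:
        raise ValueError("no acceptable exact solution: the table contradicts a decreasing density")
    return (math.log(n1) - math.log(n2)) / (y - x)


# Hypothetical table whose frequency ratio matches the theoretical one exactly.
lam, x, y = 2.0, 0.5, 1.5
n2 = 1000.0
n1 = n2 * math.exp(lam * (y - x))

print(lambda_new(x, y, n1, n2))        # recovers lam up to rounding
print(lambda_classical(x, y, n1, n2))  # differs from lam

# Perturbing the ratio by eps moves the estimate by a term of order eps.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, lambda_new(x, y, n1 + eps * n2, n2) - lam)
```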

b. Assume now that the table given is that of fixed type-I censored data. For
example, in non-truncated grouped data one kept only the centers of two
classes and their corresponding frequencies. With the new approach the table
is enough and the solution is exactly the same as previously. But in this case
the preceding estimate of the traditional approach is no longer valid. One
must use the methods for truncated data. One then needs the part of the
support of $f$ represented here by $x$ and $y$. To be able to carry out the
calculations, let us suppose that this table is the summary of the
observations falling into the interval $[0, c]$ with $c > 0$. This is a right
truncated sample. We consider the observed likelihood

$$\left( \frac{f(x)}{F(c)} \right)^{n_1} \left( \frac{f(y)}{F(c)} \right)^{n_2}.$$

We have to consider that $n_T$ observations are greater than $c$ and have been
discarded, but $n_T$ is unknown. In order to compute the complete likelihood
we have to determine the conditional distribution of $n_T$, given that the
observations follow an exponential distribution, so as to implement the EM
algorithm, which requires the computation of the conditional expectation of
the complete log-likelihood function. An analytic solution is then not
available, and a recursive procedure is used to obtain a numerical solution.
In general the method of maximum likelihood is not always as easy to use as
the examples on the usual laws might lead one to believe. Although maximum
likelihood estimators have good statistical properties in large samples, they
often cannot be reduced to simple formulas, so estimates must be calculated
using numerical methods.

5.2 Estimation of the parameters of a normal distribution 

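Let us consider a normal law $N(m, \sigma)$.

5.2.1 Estimation of the average
Solving the following equation in $m$:

$$\frac{n_1}{n_2} = \exp\left\{ -\frac{(x - m)^2}{2\sigma^2} + \frac{(y - m)^2}{2\sigma^2} \right\},$$

we obtain

$$\hat m = \frac{x + y}{2} - \frac{\sigma^2}{y - x} \ln\frac{n_1}{n_2}.$$

It should be noted that $\hat m$ is a function of $\sigma$. When solving the
preceding equation after replacing $n_1/n_2$ by $f(x)/f(y) + \varepsilon$, we
obtain:

$$\hat m(\varepsilon) = \frac{x + y}{2} - \frac{\sigma^2}{y - x} \ln\left( \frac{f(x)}{f(y)} + \varepsilon \right),$$

where $\lim_{\varepsilon \to 0} \hat m(\varepsilon) = m$.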



5.2.2 Estimation of the Variance 

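Solving the following equation in $\sigma$,

$$\frac{n_1}{n_2} = \exp\left\{ -\frac{(x - m)^2}{2\sigma^2} + \frac{(y - m)^2}{2\sigma^2} \right\},$$

we have:

1. If $n_1/n_2 = 1$ and $-2mx + 2my + x^2 - y^2 = 0$, any value $\sigma$
belonging to $\mathbb{R}$ is a solution.

2. If $n_1/n_2 = 1$ and $-2mx + 2my + x^2 - y^2 \neq 0$, there is no solution.

3. If $n_1/n_2 \neq 1$, one obtains:

$$\hat\sigma = \sqrt{ \frac{2mx - 2my - x^2 + y^2}{2 \ln(n_1/n_2)} },$$

which is defined when the numerator and $\ln(n_1/n_2)$ have the same sign.

If $n_1/n_2 = f(x)/f(y) + \varepsilon$, one obtains
$\lim_{\varepsilon \to 0} \hat\sigma(\varepsilon) = \sigma$.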
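Both one-parameter solutions for the normal law follow from taking logarithms in the ratio equation $n_1/n_2 = f(x)/f(y)$, which gives $2\sigma^2 \ln(n_1/n_2) = (y - m)^2 - (x - m)^2$. The sketch below implements them; the function names are hypothetical. As a check it uses the two observations $y_3 = -1.5331$ and $y_6 = 0.038690$ and the frequencies of the numerical example of section 6, for which the paper reports the estimates 0.11369 (mean, $\sigma = 1$ known) and 0.93164 (standard deviation, $m = 0$ known).

```python
import math


def mean_from_two_points(x, y, n1, n2, sigma):
    """Solve n1/n2 = f(x)/f(y) in m for a normal law with known sigma."""
    return (x + y) / 2 - sigma**2 * math.log(n1 / n2) / (y - x)


def sigma_from_two_points(x, y, n1, n2, m):
    """Solve n1/n2 = f(x)/f(y) in sigma for a normal law with known m."""
    log_ratio = math.log(n1 / n2)
    numerator = 2 * m * x - 2 * m * y - x**2 + y**2  # equals (y-m)^2 - (x-m)^2
    if log_ratio == 0 or numerator / log_ratio <= 0:
        raise ValueError("no unique positive solution for sigma")
    return math.sqrt(numerator / (2 * log_ratio))


x, y = -1.5331, 0.038690

# First column of the table of section 6: n_3 = 23000, n_6 = 89000.
m_hat = mean_from_two_points(x, y, 23000, 89000, sigma=1.0)
s_hat = sigma_from_two_points(x, y, 23000, 89000, m=0.0)
print(m_hat, s_hat)

# Last column: frequencies whose ratio matches the standard normal ratio.
print(mean_from_two_points(x, y, 27500, 89000, sigma=1.0))
print(sigma_from_two_points(x, y, 27500, 89000, m=0.0))
```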



5.3 Remarks

1. As shown in the examples above, if there is a table of frequencies based on
two observations and one estimates only one parameter, then with $d_v$ one
easily obtains optimal estimates in the most general sense of the term. This
is not always easy when the table is based on $k$ observations
$y_1, \ldots, y_k$ with $k \geq 3$. If the table is of this form and we cannot
determine a global exact solution, we propose to take the various possible
couples of observations in $\{y_1, \ldots, y_k\}$ and to determine, for each
couple, the exact solution when it exists and an approximate one otherwise.
Each estimate is weighted by the sum of the frequencies of the elements of the
couple, and we calculate their mean. For example, in the case of the first
example, if there are exact solutions $\tilde\lambda_{ij}$ for the various
couples $(y_i, y_j)$, we take

$$\tilde\lambda = \Big( 1 \Big/ \sum_{i < j} (n_i + n_j) \Big) \sum_{i < j} (n_i + n_j)\, \tilde\lambda_{ij}.$$

Here, for each couple, the estimate converges towards the true value when the
differences between the ratios of the empirical relative frequencies and the
corresponding theoretical ones tend towards 0; the same then holds for the
weighted mean.

2. In the first example we obtained the same solution with $d_v$ and with the
method of maximum likelihood of the new approach. This is not an isolated
case. In the various examples considered in this document, when we estimate
only one parameter, the two methods always give concordant results.
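The weighted combination described in Remark 1 can be sketched for the exponential example as follows. This is an illustration only: the function names and the table are hypothetical, and the pairwise estimate is the exact two-point solution of section 5.1, assumed to exist for every couple.

```python
import math
from itertools import combinations


def pairwise_lambda(y_i, y_j, n_i, n_j):
    """Exact two-point solution for the exponential law (section 5.1)."""
    return (math.log(n_i) - math.log(n_j)) / (y_j - y_i)


def weighted_estimate(ys, ns):
    """Mean of the pairwise estimates, each weighted by n_i + n_j."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(ys)), 2):
        weight = ns[i] + ns[j]
        num += weight * pairwise_lambda(ys[i], ys[j], ns[i], ns[j])
        den += weight
    return num / den


# Hypothetical table: frequencies proportional to exp(-lam * y), so every
# couple has the same exact solution and the weighted mean returns lam.
lam = 1.5
ys = [0.2, 0.7, 1.1, 2.0]
ns = [1000.0 * math.exp(-lam * y) for y in ys]
print(weighted_estimate(ys, ns))

# With perturbed frequencies the pairwise estimates differ and are averaged.
ns_noisy = [n * (1 + d) for n, d in zip(ns, (0.02, -0.01, 0.03, -0.02))]
print(weighted_estimate(ys, ns_noisy))
```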



6 Numerical Example 

Even in the discrete case the two approaches are different since, contrary to
the traditional one, the new approach does not distinguish truncated samples
from non-truncated ones. In the traditional approach to truncated samples, all
parts of the support of the estimated distribution which are supposed to be
observed are used in the calculations through the conditional theoretical
distribution. With the new one we use only the observations. Now, if we
consider that samples which do not cover the whole support of the distribution
from which they emanated are truncated, the truncations being the parts which
do not appear in the observations, and we apply the traditional approach, we
recover the new one. For this reason we do not dwell on the discrete case and
give only examples concerning the continuous case. It is not easy to present a
comparative study of the numerical results of the two approaches, since to a
single estimate of the new approach there correspond two estimates of the
traditional one, according to whether the sample is considered truncated or
not. In addition, in the traditional approach, when the sample is truncated
the nature of the truncation is used in the calculations. The frequency table
alone, without indication of the parts observed, is then not enough: each time
one must indicate the intervals represented by the observations in the table.
For all these reasons we present the estimates of the two approaches only when
this better underlines the specificity of the new one. For example, we
simulated synthetic data of size 400 from the standard normal distribution and
grouped them into 11 classes represented by the observations
$y_1, \ldots, y_{11}$ and their frequencies. We obtain $y_3 = -1.5331$,
$y_6 = 0.0386$ and $y_8 = 1.0863$ with respective absolute frequencies
$n_3 = 23$, $n_6 = 89$ and $n_8 = 43$. In the table presented hereafter, in
the part before the line of $n_8$, we consider the two observations $y_3$ and
$y_6$. The distance $d_v$ at these two points between the empirical
distribution and the standard normal distribution is zero when one takes
$n_3 = 27500$ and $n_6 = 89000$. We therefore fix $n_6 = 89000$ and give
increasing values to $n_3$, closer and closer to 27500 as indicated in the
table, and we estimate $m$ when $\sigma$ is known and $\sigma$ when $m$ is
known. Each time we estimate them with the method of minimal distance with
$d_v$, the method of moments of the new approach and the method of maximum
likelihood of the classical approach. We denote the estimates obtained with
$d_v$ and with maximum likelihood of the new approach respectively by
$\tilde m$ and $\hat m_{Mnew}$ for the average and $\tilde\sigma$ and
$\hat\sigma_{Mnew}$ for the standard deviation, and we denote by
$\hat m_{CLH}$ and $\hat\sigma_{CLH}$ those obtained with the classical
maximum likelihood procedure for truncated samples. For the latter, the
observed part is assumed to be
$[-1.7951, -1.2712[ \;\cup\; [-0.22335, 0.30055[$.



y_3 = -1.5331, y_6 = 0.038690, y_8 = 1.0863, n_6 = 89000

n_3          23000       24000       26000       27000       27500
m̂            0.11369     0.08661     0.03568     0.01167     -0.000001
m̂_Mnew       0.11369     0.08661     0.03568     0.01167     -0.000001
m̂_CLH        0.11075     0.08444     0.03478     0.01128     0.000155
σ̂            0.93164     0.94664     0.97694     0.99228     1.0
σ̂_Mnew       0.93164     0.94664     0.97694     0.99228     1.0
σ̂_CLH        0.92171     0.93701     0.967796    0.98335     0.991165

n_8          43000       44444       47273       48214       49371
m̂            -0.02224    -0.017549   -0.00785    -0.00762    0.000002
m̂_Mnew       0.036763    0.051907    0.088443    0.10294     0.0000005
σ̂            0.91767     0.93546     0.97180     0.98716     1.0
σ̂_Mnew       1.0689      1.1080      1.1968      1.242       1
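The single-parameter rows of the table can be explored numerically. The
definition of the distance is not restated in this section, so the sketch below
rests on an assumption: a zero distance at the two points is taken to mean that
the ratio of the empirical frequencies of the third and sixth classes equals
the ratio of the candidate normal densities at the corresponding observations.
The function names are hypothetical. In a quick check this rule gives values
close to the first estimates listed for the mean and the standard deviation.

```python
import math

# Observations and fixed frequency quoted in the table header.
Y3, Y6, N6 = -1.5331, 0.038690, 89000


def mean_from_ratio(n3, sigma=1.0, n6=N6, y3=Y3, y6=Y6):
    """Mean m for which the N(m, sigma^2) density ratio f(y3)/f(y6) equals n3/n6.

    ASSUMPTION: this ratio-matching rule stands in for 'distance equal to zero'.
    """
    log_ratio = math.log(n3 / n6)
    # log f(y3) - log f(y6) = -((y3 - m)^2 - (y6 - m)^2) / (2 sigma^2),
    # which is linear in m, so m has a closed form.
    return (sigma ** 2 * log_ratio + (y3 ** 2 - y6 ** 2) / 2) / (y3 - y6)


def sigma_from_ratio(n3, m=0.0, n6=N6, y3=Y3, y6=Y6):
    """Standard deviation for which the N(m, sigma^2) density ratio equals n3/n6."""
    log_ratio = math.log(n3 / n6)
    return math.sqrt(-((y3 - m) ** 2 - (y6 - m) ** 2) / (2 * log_ratio))


if __name__ == "__main__":
    # Frequencies of the third class taken from the first row of the table.
    for n3 in (23000, 24000, 26000, 27000, 27500):
        print(n3, round(mean_from_ratio(n3), 5), round(sigma_from_ratio(n3), 5))
```

With one unknown parameter and two points there is a single ratio equation, so
under this assumption the estimate has a closed form and no numerical
optimization is needed.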



In the part after the row of n_8 we estimate m and σ simultaneously by the
method of minimal distance with d_v and by the method of moments of the new
approach, starting from the observations y_3, y_6 and y_8, fixing the frequency
n_6 = 89000 and taking for n_3 and n_8 the frequencies indicated. We then
observe what happens when the differences between the ratios of the empirical
frequencies and the corresponding theoretical ratios tend towards 0. In the
various examples considered, when only one parameter is estimated, the various
methods of the new approach agree completely. This is not the case when two
parameters are estimated simultaneously. In the table above, when we estimate m
and σ simultaneously with the method of moments of the new approach or the
method of minimal distance with d_v, we obtain their exact values when the
ratios of the empirical frequencies coincide exactly with the corresponding
theoretical ones. But with the method of moments, as we can see, the difference
between the estimated parameters and their true values does not necessarily
decrease when the difference between these ratios decreases, as it does with
the method of minimal distance with d_v. This property seems to be specific to
estimation with d_v. Here, for each of the estimates with d_v, the distance in
the sense of d_v between the empirical distribution considered and the
distribution to which it leads is zero. Consequently the estimates with d_v in
that table are optimal in the most general sense.
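The simultaneous estimation of the mean and the standard deviation from three
points can be sketched in the same spirit. The code below is an illustration
under two assumptions that this section does not confirm: a zero distance is
taken to mean that both frequency ratios (third to sixth class, eighth to sixth
class) equal the corresponding normal density ratios, and each frequency of the
third class from the first part of the table is paired column by column with a
frequency of the eighth class. The function name is hypothetical. Taking logs
makes the two equations linear in the mean and the variance, so they are solved
here by Cramer's rule.

```python
import math


def mean_sigma_from_ratios(n3, n6, n8, y3, y6, y8):
    """Solve f(y3)/f(y6) = n3/n6 and f(y8)/f(y6) = n8/n6 for a normal density f.

    ASSUMPTION: ratio matching stands in for 'distance equal to zero'.
    Returns the pair (m, sigma).
    """
    a3 = math.log(n3 / n6)
    a8 = math.log(n8 / n6)
    # With v = sigma^2, each log-ratio equation reads
    #   2*a*v - 2*(y - y6)*m = -(y^2 - y6^2),
    # a linear system in the unknowns (v, m).
    c11, c12, b1 = 2 * a3, -2 * (y3 - y6), -(y3 ** 2 - y6 ** 2)
    c21, c22, b2 = 2 * a8, -2 * (y8 - y6), -(y8 ** 2 - y6 ** 2)
    det = c11 * c22 - c12 * c21
    v = (b1 * c22 - c12 * b2) / det
    m = (c11 * b2 - b1 * c21) / det
    return m, math.sqrt(v)


if __name__ == "__main__":
    y3, y6, y8, n6 = -1.5331, 0.038690, 1.0863, 89000
    # First and last columns of the table (pairing assumed, see above).
    for n3, n8 in ((23000, 43000), (27500, 49371)):
        print(n3, n8, mean_sigma_from_ratios(n3, n6, n8, y3, y6, y8))
```

Under these assumptions the first column gives a mean near -0.022 and a standard
deviation near 0.918, and the last column gives values very close to 0 and 1.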

7 Comparison of the two approaches 

A thorough comparison of the two approaches to estimation requires more than a
single section. However, taking the viewpoint of users of statistics, we can
try to characterize what the new approach achieves at various levels.

7.1 Procedures 

We place at the disposal of statisticians all the usual methods of estimation
and a new one. The remarkable fact about the new approach is that everything
proceeds as if the data were discrete, apart from the need to group
observations into classes in the continuous case; moreover, when fixed type-I
censoring must be considered, nothing changes in the computations. With this
unification of several methods of estimation we obtain a considerable
simplification of procedures compared to the traditional approach.

7.2 Computations 

With the new approach, since everything is discrete, the usual difficulties
related to integral calculus disappear. With the method of maximum likelihood,
in the traditional approach or the new one, we sometimes encounter great
difficulties when several parameters must be estimated simultaneously. But with
the method of minimal distance with d_v one can always easily propose an
acceptable solution.

7.3 Credibility of estimates

The statistician can now estimate with various methods, those of the
traditional approach and those of the new one. If two appreciably different
results are obtained, one of them must be chosen. Usually we do not decide in
this way, since in the traditional approach we do not have criteria which give
guarantees on a given specific estimate. We have only criteria which give
guarantees on average, asymptotically, or through confidence intervals. In this
spirit, to have the new approach accepted we would have to prove that it yields
estimates that are better, with respect to these criteria, than those usually
obtained. If one adopts this viewpoint, it is useless to continue because, for
example, one cannot find anything better than the empirical mean to estimate
the mean of the normal law. Of course nothing prevents us from also looking at
the usual criteria in the new approach, but there are new elements. In certain
cases one can henceforth affirm with certainty, without determining the
estimator, that the point estimate obtained with the new method is better than
that obtained with the maximum likelihood procedure. In other cases one can
give estimators and affirm, without studying their properties, that they cannot
be improved. Indeed, when the distance, in the sense of d_v, between a given
empirical distribution and the best-fitting theoretical one is zero, the
estimate obtained is optimal in the general sense. When the distance in the
sense of d_v between a given empirical distribution and the one obtained by the
method of minimal distance with d_v is not zero, the solution obtained is
regarded as optimal only in the sense of d_v. In this case it may also be
optimal in the most general sense, which would then have to be established.
This question remains to be studied.

8 Conclusion 

We introduced a new distance and proposed a new approach to estimation.

1. The new distance.

We introduced a new distance and used it in parameter estimation, where we
noticed the following.

a. One can estimate even when the family of candidate theoretical distributions
is not homogeneous, and there is always a solution, which will in general be
acceptable.

b. Given a discrete empirical distribution associated with a sample drawn from
a theoretical one,

- If the ratios of the frequencies of the first coincide with those of the
second, we recover the latter exactly.

- If the ratios of the frequencies of the first coincide with those of the
best-fitting theoretical distribution, then the estimates obtained are optimal
in the sense that they cannot be improved.

- We showed on some examples that if the ratios of the frequencies of the first
tend towards the corresponding theoretical ones of the second, then the
estimates tend towards the true parameters. This immediately implies the
convergence of the estimators. We showed the convergence in probability of the
estimator for a broad class of usual laws.

c. We introduced a quality criterion which, when it holds, is stronger than
satisfying all the usual criteria together, and we showed on some examples that
in certain cases we can easily determine estimates which satisfy it.

In addition we note a certain flexibility in calculations with d_v compared to
the method of maximum likelihood.

2. The new approach.

We proposed a new approach to parameter estimation. When it is applied,
everything works as if the data were discrete, apart from the need to group the
observations into bins in the continuous case. Since everything is discrete,
the usual difficulties related to integral calculus disappear; moreover, when
fixed type-I censoring must be considered, nothing changes in the computations.
This unification of several methods of estimation leads to a simplification of
the procedures compared to the traditional approach.